Overview

This week's #TidyTuesday dataset covers one year of R downloads from the RStudio CRAN mirror, between October 20, 2017 and October 20, 2018. Source: cran-logs.rstudio.com. The data contains date, time, size, version, operating system and country variables.

Load data

library(tidyverse)
library(stringr)
library(knitr)
library(here)
library(hrbrthemes)
library(gghighlight)
library(scales)
library(ggthemr)
library(waffle)
#Set theme as Dust using ggthemr
ggthemr(palette = "dust", layout = "clear", set_theme = TRUE)
#Import data from website
#Rdown <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2018-10-30/r_downloads_year.csv")
Rdown <- read_csv(here("R_downloads.csv"))
kable(head(Rdown))
X1 X1_6 X1_5 X1_4 X1_3 X1_2 X1_1 date time size version os country ip_id
1 1 1 1 1 1 1 2017-10-23 14:29:18 78171332 3.4.2 win ES 1
2 2 2 2 2 2 2 2017-10-23 14:29:22 20692638 3.4.2 win PT 2
3 3 3 3 3 3 3 2017-10-23 14:29:57 972075 3.4.2 win PL 3
4 4 4 4 4 4 4 2017-10-23 14:30:00 1032203 3.0.3 win JP 4
5 5 5 5 5 5 5 2017-10-23 14:30:18 78171332 3.4.2 win CN 5
6 6 6 6 6 6 6 2017-10-23 14:30:50 64228612 3.4.2 osx US 6
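The X1 to X1_6 columns in the output above look like leftover row-index columns, which typically appear when a data frame is written to CSV with row names and read back several times. A minimal sketch of dropping them before analysis, using a small hypothetical tibble (toy) in place of Rdown:

```r
library(dplyr)

# Hypothetical stand-in for Rdown with two leftover index columns
toy <- tibble(
  X1 = 1:3,
  X1_1 = 1:3,
  date = as.Date("2017-10-23"),
  os = c("win", "win", "osx"),
  country = c("ES", "PT", "US")
)

# Drop every column whose name starts with "X1"
toy_clean <- toy %>% select(-starts_with("X1"))
names(toy_clean)
```

The same select() call should work on Rdown, assuming no real variable name starts with "X1".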

Analysis/visualization

Which countries had the most R downloads?

#Limit to top 20 countries
Rdown_20 <- Rdown %>%
  group_by(country) %>%
  summarise(country_n = n()) %>%
  top_n(n = 20, wt = country_n)
bar <- ggplot(Rdown_20, aes(x = reorder(country, country_n), y = country_n)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_y_continuous(labels = comma_format()) +
  labs(
    x = " ",
    y = "Number of downloads",
    title = "Top 20 countries downloading R between October 20, 2017 and October 20, 2018",
    caption = "Source: cran-logs.rstudio.com")
bar
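In dplyr 1.0.0 and later, top_n() is superseded by slice_max(). A minimal sketch of the same top-n ranking on a hypothetical toy tibble; the filter() step assumes some rows may have a missing country and is not part of the original analysis:

```r
library(dplyr)

# Hypothetical stand-in for Rdown: one row per download
toy <- tibble(country = c("US", "US", "US", "CN", "CN", "ES", NA))

top2 <- toy %>%
  filter(!is.na(country)) %>%             # assumption: skip rows with no recorded country
  count(country, name = "country_n") %>%  # same as group_by() + summarise(n())
  slice_max(country_n, n = 2)             # keep the 2 largest counts (ties included)
top2
```

For the real data, n = 20 would reproduce the top 20 ranking used in the bar chart.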

Which operating system was most commonly used?

#Summarise downloads by operating system
os <- Rdown %>%
  group_by(os) %>%
  summarise(os_n = n()) %>%
  mutate(percent = (os_n / sum(os_n)) * 100)
#Percentages entered by hand, presumably rounded from the os summary above
os_percent <- c(`OSX` = 21, `Windows` = 74, `Src` = 5)

#Make a waffle chart (stored as waffle_plot so the object does not share a name with the waffle() function)
waffle_plot <- waffle(os_percent, rows = 8, colors = c("#969696", "#1879bf", "#db735c"), legend_pos = "bottom", xlab = NULL,
  title = "R downloads (%) by operating system\nOctober 20, 2017 to October 20, 2018")
waffle_plot
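The percentages passed to waffle() are typed in by hand. A minimal sketch of building the named vector from a summary table instead, so it stays in sync with the data; the toy tibble is hypothetical and sized only to mirror the hard-coded 74/21/5 split:

```r
library(dplyr)

# Hypothetical stand-in for Rdown: 100 downloads across three operating systems
toy <- tibble(os = c(rep("win", 74), rep("osx", 21), rep("src", 5)))

os_tbl <- toy %>%
  count(os, name = "os_n") %>%
  mutate(percent = round(os_n / sum(os_n) * 100))

# Named numeric vector, the input shape waffle() accepts
os_percent <- setNames(os_tbl$percent, os_tbl$os)
os_percent
```

Rounded percentages will not always sum to exactly 100, so the number of squares in the chart can be off by one or two on real data.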

Which dates had the most and the fewest downloads?

#Use lubridate to extract month and day
library(lubridate)
Rdown2 <- Rdown %>% mutate(mth = month(date), day = day(date))
Rdown_date <- Rdown2 %>% group_by(mth, day) %>% summarise(count = n())
hm <- ggplot(data = Rdown_date, aes(x = mth, y = day)) +
  geom_tile(aes(fill = count))
hm
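Because the data run from October 2017 into October 2018, month(date) puts October of both years in the same heatmap column. A minimal sketch of one way to keep them apart, using lubridate's floor_date() to build a year-month key; the toy tibble and the year_month name are hypothetical:

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for Rdown: downloads in October of two different years
toy <- tibble(date = as.Date(c("2017-10-23", "2017-10-24", "2018-10-01", "2018-10-01")))

toy_month <- toy %>%
  mutate(
    year_month = floor_date(date, unit = "month"),  # first day of that calendar month
    day = day(date)
  ) %>%
  count(year_month, day, name = "count")
toy_month
```

Mapping year_month to the x axis in place of mth would then give one column per calendar month.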

library(plotly)
Rdown_date2 <- Rdown2 %>% group_by(date) %>% summarise(count = n())
line <- ggplot(Rdown_date2, aes(date, count)) +
  geom_line() +
  geom_smooth(se = FALSE, color = "steelblue") +
  scale_y_continuous(labels = comma_format()) +
  labs(
    x = " ",
    y = " ",
    title = "R downloads between October 20, 2017 and October 20, 2018",
    caption = "Source: cran-logs.rstudio.com"
  )
line

#Make an interactive version of the line chart
ggplotly(line)
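The charts show the overall pattern, but the busiest and quietest dates can also be pulled out directly from a daily count table. A minimal sketch on hypothetical daily counts (toy_daily stands in for Rdown_date2; the numbers are made up), assuming dplyr 1.0.0 or later for slice_max() and slice_min():

```r
library(dplyr)

# Hypothetical daily download counts
toy_daily <- tibble(
  date = as.Date("2018-08-01") + 0:4,
  count = c(120L, 95L, 310L, 40L, 150L)
)

busiest <- toy_daily %>% slice_max(count, n = 1)   # date with the most downloads
quietest <- toy_daily %>% slice_min(count, n = 1)  # date with the fewest downloads
busiest
quietest
```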

Results

Between October 2017 and October 2018:

Discussion/conclusions

Downloads of R increased substantially in August 2018. Version 3.5.1 was released on 2 July 2018.

Package citations

Carson Sievert, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec and Pedro Despouy (2017). plotly: Create Interactive Web Graphics via ‘plotly.js’. R package version 4.7.1. https://CRAN.R-project.org/package=plotly

Bob Rudis (2018). hrbrthemes: Additional Themes, Theme Components and Utilities for ‘ggplot2’. R package version 0.5.0. https://CRAN.R-project.org/package=hrbrthemes

Hadley Wickham (2017). tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1. https://CRAN.R-project.org/package=tidyverse

Hadley Wickham (2018). stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.3.0. https://CRAN.R-project.org/package=stringr

Yihui Xie (2018). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.20.

Bob Rudis and Dave Gandy (2017). waffle: Create Waffle Chart Visualizations in R. R package version 0.7.0. https://CRAN.R-project.org/package=waffle

Ciaran Tobin (NA). ggthemr: Themes for ggplot2. R package version 1.1.0.